\documentclass[twocolumn,10pt]{article}
\setlength\textwidth{6.875in}
\setlength\textheight{8.875in}
% set both margins to 2.5 pc
\setlength{\oddsidemargin}{-0.1875in}% 1 - (8.5 - 6.875)/2
\setlength{\evensidemargin}{-0.1875in}
\setlength{\marginparwidth}{0pc}
\setlength{\marginparsep}{0pc}%
\setlength{\topmargin}{0in} \setlength{\headheight}{0pt}
\setlength{\headsep}{0pt}
\setlength{\footskip}{37pt}%
%\setlength{\columnsep}{0.3125in}
%\setlength{\columnwidth}{3.28125in}% (6.875 - 0.3125)/2 = 3.28125in
\setlength{\parindent}{1pc}
\newcommand{\myMargin}{1.00in}
\usepackage[top=\myMargin, left=\myMargin, right=\myMargin, bottom=\myMargin, nohead]{geometry}
\usepackage{epsfig,graphicx}
\usepackage{palatino}
\usepackage{fancybox}
\usepackage{url}
\usepackage[procnames]{listings}

\input{../style/scala.tex}

\lstset{frame=, basicstyle={\footnotesize\ttfamily}}

\newcommand{\todo}[1]{\emph{TODO: #1}}
\newcommand{\comment}[1]{\emph{Comment: #1}}

% uncomment following for final submission
\renewcommand{\todo}[1]{}
\renewcommand{\comment}[1]{}

\newenvironment{commentary}
{ \vspace{-0.1in}
  \begin{quotation}
  \noindent
  \small \em
  \rule{\linewidth}{1pt}\\
}
{
  \end{quotation}
}

% \newenvironment{kode}%
% {\footnotesize
%  %\setlength{\parskip}{0pt}
%   %\setlength{\topsep}{0pt}
%   %\setlength{\partopsep}{0pt}
%  \verbatim}
% {\endverbatim 
% %\vspace*{-0.1in}
%  }

% \newenvironment{kode}%
% {\VerbatimEnvironment
% \footnotesize\begin{Sbox}\begin{minipage}{6in}\begin{Verbatim}}%
% {\end{Verbatim}\end{minipage}\end{Sbox}
% \setlength{\fboxsep}{8pt}\fbox{\TheSbox}}

% \newenvironment{kode}
% {\begin{Sbox}
% \footnotesize
% \begin{minipage}{6in}
%   %\setlength{\parskip}{0pt}
%   %\setlength{\topsep}{0pt}
%   %\setlength{\partopsep}{0pt}
%   \verbatim}
% {\endverbatim 
% \end{minipage}
% \end{Sbox} 
% \fbox{\TheSbox}
%  %\vspace*{-0.1in}
%  }

\title{Chisel 2.2 Tutorial}
\author{Jonathan Bachrach, Krste Asanovi\'{c}, John Wawrzynek \\
EECS Department, UC Berkeley\\
{\tt  \{jrb|krste|johnw\}@eecs.berkeley.edu}
}
\date{\today}

\newenvironment{example}{\VerbatimEnvironment\begin{footnotesize}\begin{Verbatim}}{\end{Verbatim}\end{footnotesize}}
\newcommand{\kode}[1]{\begin{footnotesize}{\tt #1}\end{footnotesize}}

\def\code#1{{\tt #1}}

\def\note#1{\noindent{\bf [Note: #1]}}
%\def\note#1{}

\begin{document}
\maketitle{}

% TODO: default
% TODO: enum yields Bits
% TODO: why hardware construction languages

\section{Introduction}

This document is a tutorial introduction to {\em Chisel} (Constructing
Hardware In a Scala Embedded Language).  Chisel is a hardware
construction language embedded in the high-level programming language
Scala.  At some point we will provide a proper reference manual, in
addition to more tutorial examples.  In the meantime, this document
along with a lot of trial and error should set you on your way to
using Chisel.  Chisel is really only a set of special class
definitions, predefined objects, and usage conventions within Scala,
so when you write a Chisel program you are actually writing a Scala
program.  However, for the tutorial we don't presume that you
understand how to program in Scala.  We will point out necessary Scala
features through the Chisel examples we give, and significant hardware
designs can be completed using only the material contained herein.
But as you gain experience and want to make your code simpler or more
reusable, you will find it important to leverage the underlying power
of the Scala language. We recommend you consult one of the excellent
Scala books to become more expert in Scala programming.

% MS: maybe infancy can now be dropped. Chisel has proven
% to be mature enough for serious designs.
Chisel is still in its infancy and you are likely to encounter some
implementation bugs, and perhaps even a few conceptual design
problems.  However, we are actively fixing and improving the language,
and are open to bug reports and suggestions.  Even in its early state,
we hope Chisel will help designers be more productive in building
designs that are easy to reuse and maintain.

\begin{commentary}
Through the tutorial, we format commentary on our design choices as in
this paragraph.  You should be able to skip the commentary sections
and still fully understand how to use Chisel, but we hope you'll find
them interesting.

We were motivated to develop a new hardware language by years of
struggle with existing hardware description languages in our research
projects and hardware design courses.  Verilog and VHDL were developed
as hardware {\em simulation} languages, and only later did they become
a basis for hardware {\em synthesis}.  Much of the semantics of these
languages are not appropriate for hardware synthesis and, in fact,
many constructs are simply not synthesizable.  Other constructs are
non-intuitive in how they map to hardware implementations, or their
use can accidently lead to highly inefficient hardware structures.
While it is possible to use a subset of these languages and yield
acceptable results, they nonetheless present a cluttered and confusing
specification model, particularly in an instructional setting.

However, our strongest motivation for developing a new hardware
language is our desire to change the way that electronic system design
takes place.  We believe that it is important to not only teach
students how to design circuits, but also to teach them how to design
{\em circuit generators}---programs that automatically generate
designs from a high-level set of design parameters and constraints.
Through circuit generators, we hope to leverage the hard work of
design experts and raise the level of design abstraction for everyone.
To express flexible and scalable circuit construction, circuit
generators must employ sophisticated programming techniques to make
decisions concerning how to best customize their output circuits
according to high-level parameter values and constraints.  While
Verilog and VHDL include some primitive constructs for programmatic
circuit generation, they lack the powerful facilities present in
modern programming languages, such as object-oriented programming,
type inference, support for functional programming, and reflection.

Instead of building a new hardware design language from scratch, we
chose to embed hardware construction primitives within an existing
language.  We picked Scala not only because it includes the
programming features we feel are important for building circuit
generators, but because it was specifically developed as a base for
domain-specific languages.
\end{commentary}

\section{Hardware expressible in Chisel}

% The initial version of Chisel only supports the expression of
% synchronous RTL (Register-Transfer Level) designs, with a single
% common clock.  Synchronous RTL circuits can be expressed as a
% hierarchical composition of modules containing combinational logic and
% clocked state elements.  Although Chisel assumes a single global
% clock, local clock gating logic is automatically generated for every
% state element in the design to save power.
% \begin{commentary}
% Modern hardware designs often include multiple islands of logic, where
% each island uses a different clock and where islands must correctly
% communicate across clock island boundaries.  Although clock-crossing
% synchronization circuits are notoriously difficult to design, there
% are known good solutions for most scenarios, which can be packaged as
% library elements for use by designers.  As a result, most effort in
% new designs is spent in developing and verifying the functionality
% within each synchronous island rather than on passing values between
% islands.
% 
% In its current form, Chisel can be used to describe each of the
% synchronous islands individually. Existing tool frameworks can tie
% together these islands into a complete design.  For example, a
% separate outer simulation framework can be used to model the assembly
% of islands running together.  It should be noted that exhaustive
% dynamic verification of asynchronous communications is usually
% impossible and that more formal static approaches are usually
% necessary.
% \end{commentary}

This version of Chisel only supports binary logic, and does not
support tri-state signals.
\begin{commentary}
We focus on binary logic designs as they constitute the vast majority
of designs in practice.  We omit support for tri-state logic in the
current Chisel language as this is in any case poorly supported by
industry flows, and difficult to use reliably outside of controlled
hard macros.
\end{commentary}

\section{Datatypes in Chisel}

Chisel datatypes are used to specify the type of values held in state
elements or flowing on wires.  While hardware designs ultimately
operate on vectors of binary digits, other more abstract
representations for values allow clearer specifications and help the
tools generate more optimal circuits.  In Chisel, a raw collection of
bits is represented by the \code{Bits} type.  Signed and unsigned integers
are considered subsets of fixed-point numbers and are represented by
types \code{SInt} and \code{UInt} respectively. Signed fixed-point
numbers, including integers, are represented using two's-complement
format.  Boolean values are represented as type \code{Bool}.  Note
that these types are distinct from Scala's builtin types such as
\code{Int} or \code{Boolean}.  Additionally, Chisel defines {\em Bundles} for making
collections of values with named fields (similar to {\em structs} in
other languages), and {\em Vecs} for indexable collections of
values.  Bundles and Vecs will be covered later.

Constant or literal values are expressed using Scala integers or
strings passed to constructors for the types:
\begin{scala}
UInt(1)       // decimal 1-bit lit from Scala Int.
UInt("ha")    // hexadecimal 4-bit lit from string.
UInt("o12")   // octal 4-bit lit from string.
UInt("b1010") // binary 4-bit lit from string.

SInt(5)    // signed decimal 4-bit lit from Scala Int.
SInt(-8)   // negative decimal 4-bit lit from Scala Int.
UInt(5)    // unsigned decimal 3-bit lit from Scala Int.

Bool(true) // Bool lits from Scala lits.
Bool(false)
\end{scala}

Underscores can be used as separators in long string literals to aid
readability, but are ignored when creating the value, e.g.:
\begin{scala}
UInt("h_dead_beef")   // 32-bit lit of type UInt
\end{scala}

By default, the Chisel compiler will size each constant to the minimum
number of bits required to hold the constant, including a sign bit for
signed types.  Bit widths can also be specified explicitly on
literals, as shown below:
\begin{scala}
UInt("ha", 8)     // hexadecimal 8-bit lit of type UInt
UInt("o12", 6)    // octal 6-bit lit of type UInt
UInt("b1010", 12) // binary 12-bit lit of type UInt

SInt(5, 7) // signed decimal 7-bit lit of type SInt
UInt(5, 8) // unsigned decimal 8-bit lit of type UInt
\end{scala}

\noindent
For literals of type \code{UInt}, the value is
zero-extended to the desired bit width.  For literals of type
\code{SInt}, the value is sign-extended to fill the desired bit width.
If the given bit width is too small to hold the argument value, then a
Chisel error is generated.

\begin{commentary}
We are working on a more concise literal syntax for Chisel using
symbolic prefix operators, but are stymied by the limitations of Scala
operator overloading and have not yet settled on a syntax that is
actually more readable than constructors taking strings.

We have also considered allowing Scala literals to be automatically
converted to Chisel types, but this can cause type ambiguity and
requires an additional import.

The SInt and UInt types will also later support an optional exponent
field to allow Chisel to automatically produce optimized fixed-point
arithmetic circuits.
\end{commentary}

\section{Combinational Circuits}

A circuit is represented as a graph of nodes in Chisel.  Each node is
a hardware operator that has zero or more inputs and that drives one
output.  A literal, introduced above, is a degenerate kind of node
that has no inputs and drives a constant value on its output.  One way
to create and wire together nodes is using textual expressions.  For
example, we can express a simple combinational logic circuit
using the following expression:

\begin{scala}
(a & b) | (~c & d)
\end{scala}

The syntax should look familiar, with \code{\&} and \code{|}
representing bitwise-AND and -OR respectively, and \code{\~{}}
representing bitwise-NOT.  The names \code{a} through \code{d}
represent named wires of some (unspecified) width.

Any simple expression can be converted directly into a circuit tree,
with named wires at the leaves and operators forming the internal
nodes.  The final circuit output of the expression is taken from the
operator at the root of the tree, in this example, the bitwise-OR.

Simple expressions can build circuits in the shape of trees, but to
construct circuits in the shape of arbitrary directed acyclic graphs
(DAGs), we need to describe fan-out.  In Chisel, we do this by naming
a wire that holds a subexpression that we can then reference multiple
times in subsequent expressions.  We name a wire in Chisel by
declaring a variable.  For example, consider the select expression,
which is used twice in the following multiplexer description:
\begin{scala}
val sel = a | b
val out = (sel & in1) | (~sel & in0)
\end{scala}

\noindent
The keyword \code{val} is part of Scala, and is used to name variables
that have values that won't change.  It is used here to name the
Chisel wire, \code{sel}, holding the output of the first bitwise-OR
operator so that the output can be used multiple times in the second
expression.

\section{Builtin Operators}

Chisel defines a set of hardware operators for the builtin types shown
in Table~\ref{tbl:chisel-operators}.
\begin{table*}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Example & Explanation \\
\hline
\hline
\multicolumn{2}{|l|}{Bitwise operators.  Valid on SInt, UInt, Bool.} \\
\hline
\hline
\verb!val invertedX = ~x!                    &   Bitwise NOT  \\
\verb!val hiBits = x & UInt("h_ffff_0000") ! &   Bitwise AND  \\
\verb!val flagsOut = flagsIn | overflow !    &   Bitwise OR   \\
\verb!val flagsOut = flagsIn ^ toggle !      &   Bitwise XOR  \\
\hline
\hline
\multicolumn{2}{|l|}{Bitwise reductions.  Valid on SInt and
  UInt.  Returns Bool. } \\
\hline
\hline
\verb!val allSet = andR(x) ! & AND reduction  \\
\verb!val anySet = orR(x)  ! & OR reduction   \\
\verb!val parity = xorR(x) !  & XOR reduction  \\
\hline
\hline
\multicolumn{2}{|l|}{Equality comparison. Valid on SInt, 
UInt, and Bool. Returns Bool.} \\
\hline
\hline
\verb@val equ = x === y@ & Equality \\
\verb@val neq = x =/= y@ & Inequality \\
\hline
\hline
\multicolumn{2}{|l|}{Shifts. Valid on SInt and UInt.} \\
\hline
\hline
\verb@val twoToTheX = SInt(1) << x@  & Logical left shift. \\
\verb@val hiBits = x >> UInt(16)@          & Right shift (logical on UInt and\&
arithmetic on SInt). \\
% \verb@val scaledX = x >>> 3@  & Arithmetic right shift, copies in sign bits. \\
\hline
\hline
\multicolumn{2}{|l|}{Bitfield manipulation.  Valid on SInt, UInt, and Bool. } \\
\hline
\hline
\verb@val xLSB = x(0)@  & Extract single bit, LSB has index 0. \\
\verb@val xTopNibble = x(15,12)@  & Extract bit field  from end to start
bit position. \\
\verb@val usDebt = Fill(3, UInt("hA"))@ & Replicate a bit string multiple times. \\
\verb@val float = Cat(sign,exponent,mantissa)@ & Concatenates bit fields, with first argument on left.\\
\hline
\hline
\multicolumn{2}{|l|}{Logical operations.  Valid on Bools. } \\
\hline
\verb@val sleep = !busy@  & Logical NOT \\
\verb@val hit = tagMatch && valid @  & Logical AND \\
\verb@val stall = src1busy || src2busy@  & Logical OR \\
\verb@val out = Mux(sel, inTrue, inFalse)@  & Two-input mux where sel is a Bool \\ % {\bf Why?} \\
\hline
\hline
\multicolumn{2}{|l|}{Arithmetic operations.  Valid on Nums: SInt and UInt. } \\
\hline
\verb@val sum = a + b@  & Addition \\
\verb@val diff = a - b @  & Subtraction \\
\verb@val prod = a * b @  & Multiplication \\
\verb@val div = a / b @  & Division \\
\verb@val mod = a % b @  & Modulus \\
\hline
\hline
\multicolumn{2}{|l|}{Arithmetic comparisons.  Valid on Nums: SInt and
  UInt. Returns Bool.} \\
\hline
\verb@val gt = a > b@  & Greater than \\
\verb@val gte = a >= b@  & Greater than or equal \\
\verb@val lt = a < b@  & Less than \\
\verb@val lte = a <= b@  & Less than or equal \\
\hline
\end{tabular}
\end{center}
\caption{Chisel operators on builtin data types.}
\label{tbl:chisel-operators}
\end{table*}

\subsection{Bitwidth Inference}

Users are required to set bitwidths of ports and registers, but otherwise,
bit widths on wires are automatically inferred unless set manually by the user.
% TODO: how do you set the width explicitly?
The bit-width inference engine starts from the graph's input ports and 
calculates node output bit widths from their respective input bit widths according to the following set of rules:

\begin{tabular}{ll}
{\bf operation} & {\bf bit width} \\ 
\verb|z = x + y| & \verb|wz = max(wx, wy)| \\
\verb+z = x - y+ & \verb|wz = max(wx, wy)|\\
\verb+z = x & y+ & \verb+wz = min(wx, wy)+ \\
\verb+z = Mux(c, x, y)+ & \verb+wz = max(wx, wy)+ \\
\verb+z = w * y+ & \verb!wz = wx + wy! \\
\verb+z = x << n+ & \verb!wz = wx + maxNum(n)! \\
\verb+z = x >> n+ & \verb+wz = wx - minNum(n)+ \\
\verb+z = Cat(x, y)+ & \verb!wz = wx + wy! \\
\verb+z = Fill(n, x)+ & \verb+wz = wx * maxNum(n)+ \\
% \verb+z = x < y+ & \verb+<= > >= && || != ===+ & \verb+wz = 1+ \\
\end{tabular}

\noindent
where for instance $wz$ is the bit width of wire $z$, and the \verb+&+
rule applies to all bitwise logical operations.

\comment{maxNum and MinNum need to be explained.}

The bit-width inference process continues until no bit width changes.
Except for right shifts by known constant amounts, the bit-width
inference rules specify output bit widths that are never smaller than
the input bit widths, and thus, output bit widths either grow or stay
the same.  Furthermore, the width of a register must be specified by
the user either explicitly or from the bitwidth of the reset value or
the \emph{next} parameter.
From these two requirements, we can show that the bit-width inference
process will converge to a fixpoint.

\begin{commentary}
Our choice of operator names was constrained by the Scala language.
We have to use triple equals \code{===} for equality and \code{=/=}
for inequality to allow the
native Scala equals operator to remain usable.

We are also planning to add further operators that constrain bitwidth
to the larger of the two inputs.
\end{commentary}

\section{Functional Abstraction}

We can define functions to factor out a repeated piece of logic that
we later reuse multiple times in a design.  For example, we can wrap
up our earlier example of a simple combinational logic block as
follows:
\begin{scala}
def clb(a: UInt, b: UInt, c: UInt, d: UInt): UInt = 
  (a & b) | (~c & d)
\end{scala}

\noindent
where \code{clb} is the function which takes \code{a}, \code{b},
\code{c}, \code{d} as arguments and returns a wire to the output of a
boolean circuit.  The \code{def} keyword is part of Scala and
introduces a function definition, with each argument followed by a colon then its type,
and the function return type given after the colon following the
argument list.  The equals (\code{=})
sign separates the function argument list from the function
definition.

We can then use the block in another circuit as follows:
\begin{scala}
val out = clb(a,b,c,d)
\end{scala}

% TODO: SHIFTER DONE FUNCTIONAL WITH LOOP

%% Because Scala has powerful type inference, we can in many cases drop
%% the type declarations on the function:
%% \begin{scala}
%% def clb(a, b, c, d) = (a & b) | (~c & d) // No types needed.

%% def bigblock(a: Bool, b: Bool, c: Bool, d: Bool,
%%              f: UInt, g: UInt, h: UInt, i: UInt): Bool =
%%                  clb(a, b, c, clb(f,g,h,i)!=0) 

%% \end{scala}

%% Here, we use \code{clb} twice.  The inner \verb!clb!  works with
%% fixed-point values to calculate the value of an internal node that is
%% compared with 0 to give a \code{Bool}, while the outer \verb!clb!
%%   works with \code{Bool} values and returns the result of the
%%   function.  Scala will perform type inference statically to check
%%   that there are no type errors.

We will later describe many powerful ways to use functions to
construct hardware using Scala's functional programming support.

\section{Bundles and Vecs}

\code{Bundle} and \code{Vec} are classes that allow the user to expand
the set of Chisel datatypes with aggregates of other types.

Bundles group together several named fields of potentially different
types into a coherent unit, much like a \code{struct} in C. Users
define their own bundles by defining a class as a subclass of
\code{Bundle}:
\begin{scala}
class MyFloat extends Bundle {
  val sign        = Bool()
  val exponent    = UInt(width = 8)
  val significand = UInt(width = 23)
}

val x  = new MyFloat()
val xs = x.sign
\end{scala}

\noindent
A Scala convention is to capitalize the name of new classes and we
suggest you follow that convention in Chisel too.  The \code{width}
named parameter to the \code{UInt} constructor specificies the number
of bits in the type.

Vecs create an indexable vector of elements, and are constructed as
follows:
\begin{scala}
// Vector of 5 23-bit signed integers.
val myVec = Vec.fill(5){ SInt(width = 23) } 

// Connect to one element of vector. 
val reg3  = myVec(3) 
\end{scala}

\noindent
(Note that we have to specify the type of the \code{Vec} elements
inside the trailing curly brackets, as we have to pass the bitwidth
parameter into the \code{SInt} constructor.)

The set of primitive classes
(\code{SInt}, \code{UInt}, and \code{Bool}) plus the aggregate
classes (\code{Bundles} and \code{Vec}s) all inherit from a common
superclass, \code{Data}.  Every object that ultimately inherits from
\code{Data} can be represented as a bit vector in a hardware design.

Bundles and Vecs can be arbitrarily nested to build complex data
structures:
\begin{scala}
class BigBundle extends Bundle {
 // Vector of 5 23-bit signed integers.
 val myVec = Vec.fill(5) { SInt(width = 23) } 
 val flag  = Bool()
 // Previously defined bundle.
 val f     = new MyFloat()              
}
\end{scala}

\noindent
Note that the builtin Chisel primitive and aggregate classes do not
require the \code{new} when creating an instance, whereas new user
datatypes will.  A Scala \code{apply} constructor can be defined so
that a user datatype also does not require \code{new}, as described in
Section~\ref{sec:funconstructor}.

\section{Ports}

Ports are used as interfaces to hardware components.  A port is simply
any \code{Data} object that has directions assigned to its members.

Chisel provides port constructors to allow a direction to be added
(input or output) to an object at construction time.  Primitive port
constructors take the direction as the first
argument (where the direction is \code{INPUT} or
\code{OUTPUT}) and the number of bits as the second argument (except
booleans which are always one bit).

An example port declaration is as follows:
\begin{scala}
class Decoupled extends Bundle {
  val ready = Bool(OUTPUT)
  val data  = UInt(INPUT, 32)
  val valid = Bool(INPUT)
}
\end{scala}

\noindent
After defining \code{Decoupled}, it becomes a new type that can be
used as needed for module interfaces or for named collections of
wires.

The direction of an object can also be assigned at instantation time:
\begin{scala}
class ScaleIO extends Bundle {
  val in    = new MyFloat().asInput
  val scale = new MyFloat().asInput
  val out   = new MyFloat().asOutput
}
\end{scala}

\noindent
The methods \code{asInput} and \code{asOutput} force all modules of
the data object to the requested direction.

By folding directions into the object declarations, Chisel is able to
provide powerful wiring constructs described later.
%% \begin{scala}
%% class MuxBundle extends Bundle {
%%   val sel = UInt(INPUT,  1)
%%   val in0 = UInt(INPUT,  1)
%%   val in1 = UInt(INPUT,  1)
%%   val out = UInt(OUTPUT, 1)
%% }

%% class Mux2 extends Module {
%%   val io = new MuxBundle()
%%   io.out := (io.sel & io.in1) | (~io.sel & io.in0)
%% }
%% \end{scala}


\section{Modules}

Chisel {\em modules} are very similar to Verilog {\em modules} in
defining a hierarchical structure in the generated circuit.
%Like functional generators, we can also parameterize the construction of
%circuits by turning them into object-oriented modules.  Unlike
%functional generators, modules also provide a coarse hierarchy on a
%circuit and permit a level of generator abstraction that is often
%useful.
The hierarchical module namespace is accessible in downstream tools
to aid in debugging and physical layout.  A user-defined module is
defined as a {\em class} which:
\begin{itemize}
\item inherits from \code{Module},
\item contains an interface stored in a port field named \code{io}, and
\item wires together subcircuits in its constructor.
\end{itemize}
As an example, consider defining your own two-input multiplexer as a
module:
\begin{scala}
class Mux2 extends Module {
  val io = new Bundle{
    val sel = UInt(INPUT, 1)
    val in0 = UInt(INPUT, 1)
    val in1 = UInt(INPUT, 1)
    val out = UInt(OUTPUT, 1)
  }
  io.out := (io.sel & io.in1) | (~io.sel & io.in0)
}
\end{scala}

\noindent
The wiring interface to a module is a collection of ports in the
form of a \code{Bundle}.  The interface to the module is defined
through a field named \code{io}.  For \code{Mux2}, \code{io} is
defined as a bundle with four fields, one for each multiplexer port.

The \code{:=} assignment operator, used here in the body of the
definition, is a special operator in Chisel that wires the input of
left-hand side to the output of the right-hand side.

\subsection{Module Hierarchy}

We can now construct circuit hierarchies, where we build larger modules out
of smaller sub-modules.  For example, we can build a 4-input
multiplexer module in terms of the \code{Mux2} module by wiring
together three 2-input multiplexers:

\begin{scala}
class Mux4 extends Module {
  val io = new Bundle {
    val in0 = UInt(INPUT, 1)
    val in1 = UInt(INPUT, 1)
    val in2 = UInt(INPUT, 1)
    val in3 = UInt(INPUT, 1)
    val sel = UInt(INPUT, 2)
    val out = UInt(OUTPUT, 1)
  }
  val m0 = Module(new Mux2())
  m0.io.sel := io.sel(0) 
  m0.io.in0 := io.in0; m0.io.in1 := io.in1

  val m1 = Module(new Mux2())
  m1.io.sel := io.sel(0) 
  m1.io.in0 := io.in2; m1.io.in1 := io.in3

  val m3 = Module(new Mux2())
  m3.io.sel := io.sel(1) 
  m3.io.in0 := m0.io.out; m3.io.in1 := m1.io.out

  io.out := m3.io.out
}
\end{scala}

\noindent
We again define the module interface as \code{io} and wire up the
inputs and outputs.  In this case, we create three \code{Mux2}
children modules, using the \code{Module} constructor function and 
the Scala \code{new} keyword to create a
new object.  We then wire them up to one another and to the ports of
the \code{Mux4} interface.

\section{Running and Testing Examples}

Now that we have defined modules, we will discuss how we actually run and test a circuit.  Chisel translates into either \verb@C++@ or Verilog.   In order to build a circuit we need to call \code{chiselMain}:

\begin{scala}
object tutorial {
  def main(args: Array[String]) = {
    chiselMain(args, () => Module(new Mux2()))
  }
}
\end{scala}

\begin{figure}
\begin{center}
\includegraphics[width=0.45\textwidth]{../tutorial/figs/DUT.pdf}
\end{center}
\caption{DUT run using a Tester object in Scala with stdin and stdout connected}
\label{fig:dut}
\end{figure}
 
Testing is a crucial part of circuit design, 
and thus in Chisel we provide a mechanism for
testing circuits by providing test vectors within Scala using
subclasses of the \code{Tester} class:

\begin{scala}
class Tester[T <: Module] (val c: T, val isTrace: Boolean = true) {
  var t: Int
  val rnd: Random
  def int(x: Boolean): BigInt
  def int(x: Int): BigInt
  def int(x: Bits): BigInt
  def reset(n: Int = 1)
  def step(n: Int): Int
  def pokeAt(data: Mem[T], index: Int, x: BigInt)
  def poke(data: Bits, x: BigInt)
  def poke(data: Aggregate, x: Array[BigInt])
  def peekAt(data: Mem[T], index: Int)
  def peek(data: Bits): BigInt
  def peek(data: Aggregate): Array[BigInt]
  def expect (good: Boolean, msg: String): Boolean
  def expect (data: Bits, target: BigInt): Boolean
}
\end{scala}

\noindent
which binds a tester to a module
and allows users to write tests using the given debug protocol.  In particular, users utilize:
\begin{itemize}
\item \code{poke} to set input port and state values,
\item \code{step} to execute the circuit one time unit,
\item \code{peek} to read port and state values, and
\item \code{expect} to compare peeked circuit values to expected arguments.
\end{itemize}

\noindent
Users connect tester instances to modules using:


\begin{scala}
object chiselMainTest {
  def apply[T <: Module]
    (args: Array[String], comp: () => T)(
     tester: T => Tester[T]): T
}
\end{scala}

\noindent
When \code{-{-}test} is given as an argument to \code{chiselMainTest}, a
tester instance runs the Design Under Test (DUT) in a separate
process with \code{stdin} and \code{stdout} connected so that debug commands can 
be sent to the DUT and responses can be received from the DUT as shown in
Figure~\ref{fig:dut}.
\noindent

For example, in the following:

\begin{scala}
class Mux2Tests(c: Mux2) extends Tester(c) {
  val n = pow(2, 3).toInt
  for (s <- 0 until 2) {
    for (i0 <- 0 until 2) {
      for (i1 <- 0 until 2) {
        poke(c.io.sel, s)
        poke(c.io.in1, i1)
        poke(c.io.in0, i0)
        step(1)
        expect(c.io.out, (if (s == 1) i1 else i0))
      }
    }
  }
}
\end{scala}

\noindent
assignments for each input of \verb+Mux2+ is set to the appropriate values using \verb+poke+. For this particular example, we are testing the \verb+Mux2+ by hardcoding the inputs to some known values and checking if the output corresponds to the known one. To do this, on each iteration we generate appropriate inputs to the module and tell the simulation to assign these values to the inputs of the device we are testing \verb+c+, step the circuit, and test the expected value.  
\comment{Maybe elaborate a little bit what a step does. I assume it is one clock tick
Is a step needed as well for testing plain combinational circuits?}
Finally, the following shows how the tester is invoked:

\begin{scala}
chiselMainTest(args + "--test", () => Module(new Mux2())){ 
  c => new Mux2Tests(c) 
}
\end{scala}

Other command arguments are as follows:
\begin{tabular}{lll}
\verb+--targetDir+ & target pathname prefix \\
\verb+--genHarness+ & generate harness file for C++ \\
\verb+--backend v+ & generate verilog \\
\verb+--backend c+ & generate C++ (default)\\
\verb+--vcd+ & enable vcd dumping \\
\verb+--debug+ & put all wires in class file \\
\end{tabular}

\comment{What does it mean to generate a harness file?}

\section{State Elements}
\label{sec:sequential}

% SINGLE CLK and RESET

The simplest form of state element supported by Chisel is a
positive edge-triggered register, which can be instantiated
as:
\begin{scala}
val reg = Reg(next = in)
\end{scala}

\noindent
This circuit has an output that is a copy of the input signal \verb+in+
delayed by one clock cycle.  Note that we do not have to specify the
type of \verb+Reg+ as it will be automatically inferred from its input
when instantiated in this way.  In the current version of Chisel,
clock and reset are global signals that are implicity included where
needed.

Using registers, we can quickly define a number of useful circuit
constructs.  For example, a rising-edge detector that takes a boolean
signal in and outputs true when the current value is true and the
previous value is false is given by:
\begin{scala}
def risingedge(x: Bool) = x && !Reg(next = x)
\end{scala}

Counters are an important sequential circuit.  To construct an
up-counter that counts up to a maximum value, \verb+max+, then wraps
around back to zero (i.e., modulo \verb!max+1!), we write:
\begin{scala}
def counter(max: UInt) = {
  val x = Reg(init = UInt(0, max.getWidth))
  x := Mux(x === max, UInt(0), x + UInt(1))
  x
}
\end{scala}

\noindent
The counter register is created in the \verb!counter! function 
with a reset value of \verb!0! (with width large enough to hold \verb+max+),
to which the register will be initialized when the global reset for the circuit is asserted.
The \verb!:=! assignment to \verb!x! in \verb!counter! wires an update combinational circuit 
which increments the counter value unless it hits the \verb+max+ at which point it wraps back to zero.
Note that when \verb!x! appears on the right-hand side of
an assigment, its output is referenced, whereas when on the left-hand
side, its input is referenced.

Counters can be used to build a number of useful sequential circuits.
For example, we can build a pulse generator by outputting true when
a counter reaches zero:
\begin{scala}
// Produce pulse every n cycles.
def pulse(n: UInt) = counter(n - UInt(1)) === UInt(0)
\end{scala}

\noindent
A square-wave generator can then be toggled by the pulse train,
toggling between true and false on each pulse:
\begin{scala}[escapechar=@]
// Flip internal state when input true.
def toggle(p: Bool) = {
  val x = Reg(init = Bool(false))
  x := Mux(p, !x, x)
  x
}

// Square wave of a given period.
def squareWave(period: UInt) = toggle(pulse(period/2))
\end{scala}

\subsection{Forward Declarations}

Purely combinational circuits cannot have cycles between nodes, and
Chisel will report an error if such a cycle is detected.  Because they
do not have cycles, combinational circuits can always be constructed
in a feed-forward manner, by adding new nodes whose inputs are derived
from nodes that have already been defined.  Sequential circuits
naturally have feedback between nodes, and so it is sometimes
necessary to reference an output wire before the producing node has
been defined.  Because Scala evaluates program statements
sequentially, we allow data nodes to serve as a wire providing
a declaration of a node that can be used immediately, but whose
input will be set later.  
For example, in a simple CPU, we need to define the \verb!pcPlus4!
and \verb!brTarget! wires so they can be referenced before defined:
\begin{scala}
val pcPlus4  = UInt()
val brTarget = UInt()
val pcNext   = Mux(io.ctrl.pcSel, brTarget, pcPlus4)
val pcReg    = Reg(next = pcNext, init = UInt(0, 32))
pcPlus4     := pcReg + UInt(4)
...
brTarget    := addOut
\end{scala}

\noindent
The wiring operator
\verb!:=! is used to wire up
the connection after \verb!pcReg! and \verb!addOut! are defined.

\subsection{Conditional Updates}

In our previous examples using registers, we simply wired the
combinational logic blocks to the inputs of the registers.
When describing the operation of state
elements, it is often useful to instead specify when updates to the
registers will occur and to specify these updates spread across
several separate statements.  Chisel provides conditional update rules
in the form of the \code{when} construct to support this style of
sequential logic description. For example,
% MS: the following is not working (anymore), it results in a single bit register
% with the incrementing code.
% val r = Reg(UInt(16))
% MS: is there any meaning having a single, unnamed parameter for Reg()?
% MS: should this be allowed?
\begin{scala}
val r = Reg(init = UInt(0, 16))
when (cond) {
  r := r + UInt(1)
}
\end{scala}

\noindent
where register \code{r} is updated at the end of the current clock
cycle only if \verb+cond+ is \code{true}.  The argument to \code{when} is a
predicate circuit expression that returns a \code{Bool}.  The update
block following \code{when} can only contain update statements using
the assignment operator \verb+:=+, simple expressions, and named wires
defined with \code{val}.

In a sequence of conditional updates, the last conditional update
whose condition is true takes priority.  For example,
\begin{scala}
when (c1) { r := UInt(1) }
when (c2) { r := UInt(2) }
\end{scala}

\noindent
leads to \code{r} being updated according to the following truth table:
\begin{center}
{\small
\begin{tabular}{|c|c|c|l|}
\hline
c1 & c2  &  r & \\
\hline
0 &  0 & r &  r unchanged \\
0 &  1 & 2 & \\
1 &  0 & 1 & \\
1 &  1 & 2& c2 takes precedence over c1 \\
\hline
\end{tabular}
}
\end{center}

\begin{figure}[h]
\centering
\includegraphics[width=3in]{figs/condupdates.pdf}
\caption{Equivalent hardware constructed for conditional updates.
  Each \code{when} statement adds another level of data mux and ORs
  the predicate into the enable chain.  The compiler effectively adds
  the termination values to the end of the chain automatically.}
\label{fig:condupdates}
\end{figure}

Figure~\ref{fig:condupdates} shows how each conditional update can be
viewed as inserting a mux before the input of a register to select
either the update expression or the previous input according to the
\code{when} predicate.  In addition, the predicate is OR-ed into an
enable signal that drives the load enable of the register.  The
compiler places initialization values at the beginning of the chain so
that if no conditional updates fire in a clock cycle, the load enable
of the register will be deasserted and the register value will not
change.
 
Chisel provides some syntactic sugar for other common forms of
conditional update.  The \verb+unless+ construct is the same as
\verb+when+ but negates its condition.  In other words,
\begin{scala}
unless (c) { body }
\end{scala}
is the same as
\begin{scala}
when (!c) { body }
\end{scala}

% The \verb+otherwise+ construct is the same as \verb+when+ with a true
% condition.  In other words,
% \begin{scala}
% otherwise { body }
% \end{scala}
% 
% \noindent
% is the same as
% \begin{scala}
% when (Bool(true)) { body }
% \end{scala}

The update block can target multiple registers, and there can be
different overlapping subsets of registers present in different update
blocks.  Each register is only affected by conditions in which it
appears.  The same is possible for combinational circuits (update
of a \code{Wire}). Note that all combinational
circuits need a default value. For example:
\begin{scala}
r := SInt(3); s := SInt(3)
when (c1)   { r := SInt(1); s := SInt(1) }
when (c2)   { r := SInt(2) }
\end{scala}

\noindent
leads to \code{r} and \code{s} being updated according to the
following truth table:
\begin{scala}
c1 c2  r  s
0   0  3  3
0   1  2  3 // r updated in c2 block, s at top level.
1   0  1  1
1   1  2  1
\end{scala}

\begin{commentary}
We are considering adding a different form of conditional update,
where only a single update block will take effect.  These atomic
updates are similar to Bluespec guarded atomic actions.
% TODO: when / .elsewhen / .otherwise
\end{commentary}

Conditional update constructs can be nested and any given block is
executed under the conjunction of all outer nesting conditions.  For
example,
\begin{scala}
when (a) { when (b) { body } }
\end{scala}

\noindent
is the same as:
\begin{scala}
when (a && b) { body }
\end{scala}

Conditionals can be chained together using
\verb+when+, \verb+.elsewhen+, \verb+.otherwise+ corresponding to
\verb+if+, \verb+else if+ and \verb+else+ in Scala.  For example,
\begin{scala}
when (c1) { u1 }
.elsewhen (c2) { u2 }
.otherwise { ud }
\end{scala}
\noindent
is the same as:
\begin{scala}
when (c1) { u1 }
when (!c1 && c2) { u2 }
when (!(c1 || c2)) { ud }
\end{scala}

We introduce the \code{switch} statement for conditional updates
involving a series of comparisons against a common key.  For example,
\begin{scala}
switch(idx) {
 is(v1) { u1 }
 is(v2) { u2 }
}
\end{scala}

\noindent
is equivalent to:
\begin{scala}
when (idx === v1) { u1 }
.elsewhen (idx === v2) { u2 }
\end{scala}

Chisel also allows a \code{Wire}, i.e., the output of some
combinational logic, to be the target of conditional update statements
to allow complex combinational logic expressions to be built
incrementally.  Chisel does not allow a combinational output to be
incompletely specified and will report an error if an unconditional
update is not encountered for a combinational output.
\begin{commentary}
In Verilog, if a procedural specification of a combinational logic
block is incomplete, a latch will silently be inferred causing many
frustrating bugs.

It could be possible to add more analysis to the Chisel compiler, to
determine if a set of predicates covers all possibilities.  But for
now, we require a single predicate that is always true in the
chain of conditional updates to a \code{Wire}.
\end{commentary}


\subsection{Finite State Machines}

A common type of sequential circuit used in digital design is a Finite
State Machine (FSM).  An example of a simple FSM is a parity
generator:

% \begin{scala}
% class Parity extends Module {
%   val io = new Bundle {
%     val in  = Bool(INPUT)
%     val out = Bool(OUTPUT) }
%   val s_even :: s_odd :: Nil = Enum(2, UInt())
%   val state  = Reg(init = s_even)
%   switch(state, Array(
%     (s_even, () => { when (io.in) { state := s_odd } }),
%     (s_odd,  () => { when (io.in) { state := s_even } }) ))
%   io.out := state === s_odd
% }
% \end{scala}

\begin{scala}
class Parity extends Module {
  val io = new Bundle {
    val in  = Bool(dir = INPUT)
    val out = Bool(dir = OUTPUT) }
  val s_even :: s_odd :: Nil = Enum(UInt(), 2)
  val state  = Reg(init = s_even)
  when (io.in) {
    when (state === s_even) { state := s_odd  }
    when (state === s_odd)  { state := s_even }
  }
  io.out := (state === s_odd)
}
\end{scala}

\noindent
where \verb+Enum(UInt(), 2)+ generates two \verb+UInt+ literals.
States are updated when \verb+in+ is true.  It is worth
noting that all of the mechanisms for FSMs are built upon registers,
wires, and conditional updates.

Below is a more complicated FSM example which is a circuit for
accepting money for a vending machine:
\begin{scala}
class VendingMachine extends Module {
  val io = new Bundle {
    val nickel = Bool(dir = INPUT)
    val dime   = Bool(dir = INPUT)
    val valid    = Bool(dir = OUTPUT) }
  val s_idle :: s_5 :: s_10 :: s_15 :: s_ok :: Nil = 
    Enum(UInt(), 5)
  val state = Reg(init = s_idle)
  when (state === s_idle) {
    when (io.nickel) { state := s_5 }
    when (io.dime)   { state := s_10 }
  }
  when (state === s_5) {
    when (io.nickel) { state := s_10 }
    when (io.dime)   { state := s_15 }
  }
  when (state === s_10) {
    when (io.nickel) { state := s_15 }
    when (io.dime)   { state := s_ok }
  }
  when (state === s_15) {
    when (io.nickel) { state := s_ok }
    when (io.dime)   { state := s_ok }
  }
  when (state === s_ok) {
    state := s_idle
  }
  io.valid := (state === s_ok)
}
\end{scala}

\noindent
Here is the vending machine FSM defined with \code{switch} statement:
\begin{scala}
class VendingMachine extends Module {
  val io = new Bundle {
    val nickel = Bool(dir = INPUT)
    val dime   = Bool(dir = INPUT)
    val valid    = Bool(dir = OUTPUT)
  }
  val s_idle :: s_5 :: s_10 :: s_15 :: s_ok :: Nil = Enum(UInt(), 5)
  val state = Reg(init = s_idle)
  
  switch (state) {
    is (s_idle) {
      when (io.nickel) { state := s_5 }
      when (io.dime) { state := s_10 }
    }
    is (s_5) {
      when (io.nickel) { state := s_10 }
      when (io.dime) { state := s_15 }
    }
    is (s_10) {
      when (io.nickel) { state := s_15 }
      when (io.dime) { state := s_ok }
    }
    is (s_15) {
      when (io.nickel) { state := s_ok }
      when (io.dime) { state := s_ok }
    }
    is (s_ok) {
      state := s_idle
    }
  }
  io.valid := (state ===s_ok)
}
\end{scala}

\section{Memories}

Chisel provides facilities for creating both read only and
read/write memories.  

\subsection{ROM}

Users can define read only memories with a \code{Vec}:

\begin{scala}
Vec(inits: Seq[T])
Vec(elt0: T, elts: T*)
\end{scala}

\noindent
where \verb+inits+ is a sequence of initial \verb+Data+ literals that
initialize the ROM.
For example,  users can
create a small ROM initialized to \verb+1, 2, 4, 8+ and 
loop through all values using a counter as an address generator as follows:

\begin{scala}
val m = Vec(Array(UInt(1), UInt(2), UInt(4), UInt(8)))
val r = m(counter(UInt(m.length)))
\end{scala}

\noindent
We can create an \verb+n+ value sine lookup table using a ROM initialized as follows:

\begin{scala}
def sinTable (amp: Double, n: Int) = {
  val times = 
    Range(0, n, 1).map(i => (i*2*Pi)/(n.toDouble-1) - Pi)
  val inits = 
    times.map(t => SInt(round(amp * sin(t)), width = 32))
  Vec(inits)
}
def sinWave (amp: Double, n: Int) = 
  sinTable(amp, n)(counter(UInt(n))
\end{scala}

\noindent
where \verb+amp+ is used to scale the fixpoint values stored in the ROM.

\subsection{Mem}

Memories are given special treatment in Chisel since hardware
implementations of memory have many variations, e.g., FPGA memories
are instantiated quite differently from ASIC memories.  Chisel defines
a memory abstraction that can map to either simple Verilog behavioral
descriptions, or to instances of memory modules that are available
from external memory generators provided by foundry or IP vendors.  

Chisel supports random-access memories via the \code{Mem} construct.
Writes to Mems are positive-edge-triggered and reads are either
combinational or positive-edge-triggered.\footnote{Current FPGA technology
does not support combinational (asynchronous) reads (anymore). The read address
needs to be registered.}


\begin{scala}
object Mem {
  def apply[T <: Data](type: T, depth: Int, 
          seqRead: Boolean = false): Mem
}

class Mem[T <: Data](type: T, depth: Int,
      seqRead: Boolean = false)
    extends Updateable {
  def apply(idx: UInt): T
}
\end{scala}

Ports into Mems are created by applying a \code{UInt} index.  A 32-entry
register file with one write port and two combinational read ports might be
expressed as follows:

\begin{scala}
val rf = Mem(UInt(width = 64), 32)
when (wen) { rf(waddr) := wdata }
val dout1 = rf(waddr1)
val dout2 = rf(waddr2)
\end{scala}

If the optional parameter \code{seqRead} is set, Chisel will attempt to infer
sequential read ports when the read address is a \code{Reg}.  A one-read port,
one-write port SRAM might be described as follows:

\begin{scala}
val ram1r1w =
  Mem(UInt(width = 32), 1024, seqRead = true)
val reg_raddr = Reg(UInt())
when (wen) { ram1r1w(waddr) := wdata }
when (ren) { reg_raddr := raddr }
val rdata = ram1r1w(reg_raddr)
\end{scala}

Single-ported SRAMs can be inferred when the read and write conditions are
mutually exclusive in the same \code{when} chain:

\begin{scala}
val ram1p = 
  Mem(UInt(width = 32), 1024, seqRead = true)
val reg_addr = Reg(UInt())
when (wen) { ram1p(addr) := wdata }
.elsewhen (ren) { reg_addr := addr }
val rdata = ram1p(reg_addr)
\end{scala}

If the same \code{Mem} address is both written and sequentially read on the same clock
edge, or if a sequential read enable is cleared, then the read data is
undefined.

\code{Mem} also supports write masks for subword writes.  A given bit is written if
the corresponding mask bit is set.

\begin{scala}
val ram = Mem(UInt(width = 32), 256)
when (wen) { ram.write(waddr, wdata, wmask) }
\end{scala}


% For example, an
% audio recorder could be defined as follows:
% 
% \begin{scala}
%   def audioRecorder(n: Int, button: Bool) = { 
%     val addr   = counter(UInt(n))
%     val ram    = Mem(n)
%     ram(addr) := button
%     ram(Mux(button(), UInt(0), addr))
%   } 
% \end{scala}
% 
% \noindent
% where a counter is used as an address generator into a memory.  
% The device records while \verb+button+ is \verb+true+, or plays back when \verb+false+.


\section{Interfaces and Bulk Connections}
\label{sec:interfaces}

For more sophisticated modules it is often useful to define and
instantiate interface classes while defining the IO for a module.  First and
foremost, interface classes promote reuse allowing users to capture
once and for all common interfaces in a useful form.  Secondly,
interfaces allow users to dramatically reduce wiring by supporting
{\em bulk connections} between producer and consumer modules.  Finally,
users can make changes in large interfaces in one place reducing the
number of updates required when adding or removing pieces of the
interface.

\subsection{Port Classes, Subclasses, and Nesting}

As we saw earlier, users can define their own interfaces by defining a class that subclasses \verb+Bundle+.  
For example, a user could define a simple link for handshaking data as follows:

\begin{scala}
class SimpleLink extends Bundle { 
  val data  = UInt(16, OUTPUT) 
  val valid = Bool(OUTPUT)
}
\end{scala}

\noindent
We can then extend \verb+SimpleLink+ by adding parity bits using
bundle inheritance:

\begin{scala}
class PLink extends SimpleLink { 
  val parity = UInt(5, OUTPUT) 
}
\end{scala}

\noindent
In general, users can organize their interfaces into hierarchies using inheritance.  

From there we can define a filter interface by nesting two
\verb+PLink+s into a new \verb+FilterIO+ bundle:

\begin{scala}
class FilterIO extends Bundle { 
  val x = new PLink().flip
  val y = new PLink()
}
\end{scala}

\noindent
where \verb+flip+ recursively changes the ``gender'' of a bundle,
changing input to output and output to input.

We can now define a filter by defining a filter class extending module:

\begin{scala}
class Filter extends Module { 
  val io = new FilterIO()
  ...
}
\end{scala}

\noindent 
where the \verb+io+ field contains \verb+FilterIO+. 

\subsection{Bundle Vectors}

Beyond single elements, vectors of elements form richer hierarchical interfaces.  
For example, in order to create a crossbar with a vector of inputs, producing a vector of outputs, and selected by a UInt input, 
we utilize the \verb+Vec+ constructor:

\begin{scala}
class CrossbarIo(n: Int) extends Bundle {
  val in  = Vec.fill(n){ new PLink().flip() }
  val sel = UInt(INPUT, sizeof(n))
  val out = Vec.fill(n){ new PLink() }
}
\end{scala}

% \begin{scala}
% class CrossbarIo(n: Int) extends Bundle {
%   val in  = Vec.fill(n){ UInt(w, INPUT) }
%   val sel = Vec.fill(n){ UInt(sizeof(n),  INPUT) }
%   val out = Vec.fill(n){ UInt(w, OUTPUT) }
% }
% \end{scala}

\noindent
where \verb+Vec+ takes a size as the first argument and a block returning a port as the second argument.

\subsection{Bulk Connections}

We can now compose two filters into a filter block as follows:

\begin{scala}
class Block extends Module { 
  val io = new FilterIO()
  val f1 = Module(new Filter())
  val f2 = Module(new Filter())

  f1.io.x <> io.x
  f1.io.y <> f2.io.x
  f2.io.y <> io.y
}
\end{scala}

\noindent
where \verb+<>+ bulk connects interfaces of opposite gender between
sibling modules or interfaces of same gender between parent/child modules. 
Bulk connections connect leaf ports of the same name to each other.
After all connections are made and the circuit is being elaborated,
Chisel warns users if ports have other than exactly one connection to them.

\subsection{Interface Views}

\begin{figure}
\centerline{\includegraphics[width=3in]{figs/cpu.png}}
\caption{Simple CPU involving control and data path submodules and host and memory interfaces.}
\label{fig:cpu}
\end{figure}

Consider a simple CPU consisting of control path and data path submodules and host and memory interfaces shown in Figure~\ref{fig:cpu}.
In this CPU we can see that the control path and data path each connect only to a part of the instruction and data memory interfaces. 
Chisel allows users to do this with partial fulfillment of interfaces.
A user first defines the complete interface to a ROM and Mem as follows:

\begin{scala}
class RomIo extends Bundle {
  val isVal = Bool(INPUT)
  val raddr = UInt(INPUT, 32)
  val rdata = UInt(OUTPUT, 32)
}

class RamIo extends RomIo {
  val isWr  = Bool(INPUT)
  val wdata = UInt(INPUT, 32)
}
\end{scala}

\noindent
Now the control path can build an interface in terms of these interfaces:

\begin{scala}
class CpathIo extends Bundle {
  val imem = RomIo().flip()
  val dmem = RamIo().flip()
  ...
}
\end{scala}

\noindent
and the control and data path modules can be built by partially assigning to
this interfaces as follows:

\begin{scala}
class Cpath extends Module {
  val io = new CpathIo();
  ...
  io.imem.isVal := ...;
  io.dmem.isVal := ...;
  io.dmem.isWr  := ...;
  ...
}

class Dpath extends Module {
  val io = new DpathIo();
  ...
  io.imem.raddr := ...;
  io.dmem.raddr := ...;
  io.dmem.wdata := ...;
  ...
}
\end{scala}

\noindent
We can now wire up the CPU using bulk connects as we would with other bundles:

\begin{scala}
class Cpu extends Module {
  val io = new CpuIo()
  val c  = Module(new CtlPath())
  val d  = Module(new DatPath())
  c.io.ctl  <> d.io.ctl
  c.io.dat  <> d.io.dat
  c.io.imem <> io.imem
  d.io.imem <> io.imem
  c.io.dmem <> io.dmem
  d.io.dmem <> io.dmem
  d.io.host <> io.host
}
\end{scala}

\noindent
Repeated bulk connections of partially assigned control and data path interfaces
completely connect up the CPU interface.

% A Bool can be automatically treated as a single bit UInt (with values
% 0 or 1), but an Int or UInt cannot be used as a Bool without an
% explicit cast.
%
%   Lit(5) // means a constant node with decimal value 5. Bit width will
%           // be inferred automatically if possible
%
% A node is a hardware operator that has zero or more inputs and that
% drives one output.  An example of a node with zero inputs is a
% constant generator.
%
% \begin{scala}
% Lit(10, 4) // means a constant node of type UInt that is 4 bits
%            // wide with decimal 10.
% Lit(10)
% LitInt(10, 4)
% LitUInt(10, 4)
% Lit(-1,4)
% \end{scala}
%
% can more concisely write:
%
% Module correspond to Verilog modules
% Cell is a sub-module, Chisel Module

\section{Functional Creation of Modules}
\label{sec:funconstructor}

It is also useful to be able to make a functional interface for
module construction.  For instance, we could build a constructor
that takes multiplexer inputs as parameters and returns the
multiplexer output:

\begin{scala}
object Mux2 {
  def apply (sel: UInt, in0: UInt, in1: UInt) = {
    val m = new Mux2()
    m.io.in0 := in0
    m.io.in1 := in1
    m.io.sel := sel
    m.io.out
  }
}
\end{scala}

\noindent
where \code{object Mux2} creates a Scala singleton object on the \code{Mux2}
module class, and \code{apply} defines a method for creation of a \code{Mux2} instance.
%
With this \code{Mux2} creation function, the specification of \code{Mux4} now is
significantly simpler.

\begin{scala}
class Mux4 extends Module {
  val io = new Bundle {
    val in0 = UInt(INPUT, 1)
    val in1 = UInt(INPUT, 1)
    val in2 = UInt(INPUT, 1)
    val in3 = UInt(INPUT, 1)
    val sel = UInt(INPUT, 2)
    val out = UInt(OUTPUT, 1)
  }
  io.out := Mux2(io.sel(1),
                 Mux2(io.sel(0), io.in0, io.in1),
                 Mux2(io.sel(0), io.in2, io.in3))
}
\end{scala}

Selecting inputs is so useful that Chisel builds it in and calls it
\code{Mux}.  However, unlike \code{Mux2} defined above, the builtin version allows any datatype on
\code{in0} and \code{in1} as long as they have a common super class.
In Section~\ref{sec:parameterization} we will see how to define this
ourselves.

Chisel provides \code{MuxCase} which is an n-way \code{Mux} 
\begin{scala}
MuxCase(default, Array(c1 -> a, c2 -> b, ...))
\end{scala}
 
\noindent
where each condition / value is represented as a tuple in a Scala
array and where \code{MuxCase} can be translated into the following
\code{Mux} expression:

\begin{scala}
Mux(c1, a, Mux(c2, b, Mux(..., default)))
\end{scala}

\noindent
Chisel also provides \code{MuxLookup} which is an n-way indexed multiplexer:

\begin{scala}
MuxLookup(idx, default, 
          Array(UInt(0) -> a, UInt(1) -> b, ...))
\end{scala}

\noindent
which can be rewritten in terms of:\verb+MuxCase+ as follows:

\begin{scala}
MuxCase(default, 
        Array((idx === UInt(0)) -> a, 
              (idx === UInt(1)) -> b, ...))
\end{scala}

\noindent
Note that the cases (eg. c1, c2) must be in parentheses.

% TODO: higher order filter

% \Noindent
% where the overall expression returns the value corresponding to the first condition evaluating to true.

% FUNCTIONAL CREATION
%
% want to go from io to constructor
%
% \begin{scala}
% val io = new Bundle{
%   val sel = UInt(INPUT, 1)
%   val in0 = UInt(INPUT, 1)
%   val in1 = UInt(INPUT, 1)
%   val out = UInt(OUTPUT, 1)
% }
% def Mux2(sel: UInt, in0: UInt, in0: UInt): UInt = {
%   val m = new Mux2()
%   m.io.wire(Array("sel" => sel, "in0" => in0, "in1" => in1), "out")
% }
% \end{scala}

% picture of box in box



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Polymorphism and Parameterization}
\label{sec:parameterization}

Scala is a strongly typed language and uses parameterized types to specify generic functions and classes.  
In this section, we show how Chisel users can define their own reusable functions and classes using parameterized classes.
\begin{commentary}
This section is advanced and can be skipped at first reading.
\end{commentary}

\subsection{Parameterized Functions}

Earlier we defined \code{Mux2} on \code{Bool}, but now we show how we can define a generic multiplexer function.
We define this function as taking a boolean condition and con and alt arguments (corresponding to then and else expressions) of type \code{T}:

\begin{scala}
def Mux[T <: Bits](c: Bool, con: T, alt: T): T { ... }
\end{scala}

\noindent
where \code{T} is required to be a subclass of \code{Bits}.
Scala ensures that in each usage of \code{Mux}, it can find a common superclass of the actual con and alt argument types, 
otherwise it causes a Scala compilation type error.
For example,

\begin{scala}
Mux(c, UInt(10), UInt(11))
\end{scala}

\noindent
yields a \code{UInt} wire because the \code{con} and \code{alt} arguments are each of type \code{UInt}.

% Earlier we defined \code{Mux2} on \code{Bool}, but now we show how we can define a generic \code{Mux}.  
% We define a function that takes a condition and two functions of no arguments (called thunks) for the {\it then} and {\it else} cases:
% 
% \begin{scala}
% def Mux[T <: UInt](c: Bool, con: T, alt: T): T
% def Mux[T <: UInt](c: Bool)(con: => T)(alt: => T): T
% \end{scala}
% 
% \noindent
% where the two thunk return types are parameterized to be a type \code{T} that is a subclass of \code{UInt}.
% Scala ensures that it finds a common superclass of the two thunks' return types.

We now present a more advanced example of parameterized functions for defining an inner product FIR digital filter generically over Chisel \code{Num}'s.
The inner product FIR filter can be mathematically defined as:
\begin{equation}
y[t] = \sum_j w_j * x_j[t-j]
\end{equation}

\noindent 
where $x$ is the input and $w$ is a vector of weights.
In Chisel this can be defined as:

% MS: just out of curiosity: does this example generate several delay lines?
\begin{scala}
def delays[T <: Data](x: T, n: Int): List[T] = 
  if (n <= 1) List(x) else x :: Delays(RegNext(x), n-1)

def FIR[T <: Data with Num[T]](ws: Seq[T], x: T): T = 
  (ws, Delays(x, ws.length)).zipped.
    map( _ * _ ).reduce( _ + _ )
\end{scala}
 
\noindent
where 
\code{delays} creates a list of incrementally increasing delays of its input and
\code{reduce} constructs a reduction circuit given a binary combiner function \code{f}.  
In this case, \code{reduce} creates a summation circuit.
Finally, the \code{FIR} function is constrained to work on inputs of type \code{Num} where Chisel multiplication and addition are defined.

\subsection{Parameterized Classes}

Like parameterized functions, we can also parameterize classes to make them more reusable.
For instance, we can generalize the Filter class to use any kind of link.  
We do so by parameterizing the \verb+FilterIO+ class and defining the constructor to take a zero argument type constructor function as follow:

\begin{scala}
class FilterIO[T <: Data](type: T) extends Bundle { 
  val x = type.asInput.flip
  val y = type.asOutput
}
\end{scala}

\noindent
We can now define \verb+Filter+ by defining a module class that also takes a link type constructor argument and passes it through to the \verb+FilterIO+ interface constructor:

\begin{scala}
class Filter[T <: Data](type: T) extends Module { 
  val io = new FilterIO(type)
  ...
}
\end{scala}

\noindent
We can now define a \verb+PLink+ based \verb+Filter+ as follows:
\begin{scala}
val f = Module(new Filter(new PLink()))
\end{scala}

\noindent
where the curly braces \verb+{ }+ denote a zero argument function (aka thunk) that in this case creates the link type.

A generic FIFO could be defined as shown in Figure~\ref{fig:fifo} and
used as follows:

\begin{scala}
class DataBundle extends Bundle {
  val A = UInt(width = 32)
  val B = UInt(width = 32)
}

object FifoDemo {
  def apply () = new Fifo(new DataBundle, 32)
}
\end{scala}

\begin{figure}[ht]
\begin{scala}
class Fifo[T <: Data] (type: T, n: Int) 
    extends Module {
  val io = new Bundle {
    val enq_val = Bool(INPUT)
    val enq_rdy = Bool(OUTPUT)
    val deq_val = Bool(OUTPUT)
    val deq_rdy = Bool(INPUT)
    val enq_dat = type.asInput
    val deq_dat = type.asOutput
  }
  val enq_ptr      = Reg(init = UInt(0, sizeof(n)))
  val deq_ptr      = Reg(init = UInt(0, sizeof(n)))
  val is_full      = Reg(init = Bool(false))
  val do_enq       = io.enq_rdy && io.enq_val
  val do_deq       = io.deq_rdy && io.deq_val
  val is_empty     = !is_full && (enq_ptr === deq_ptr)
  val deq_ptr_inc  = deq_ptr + UInt(1)
  val enq_ptr_inc  = enq_ptr + UInt(1)
  val is_full_next = 
    Mux(do_enq && ~do_deq && (enq_ptr_inc === deq_ptr), 
        Bool(true),
        Mux(do_deq && is_full, Bool(false), is_full))
  enq_ptr := Mux(do_enq, enq_ptr_inc, enq_ptr)
  deq_ptr := Mux(do_deq, deq_ptr_inc, deq_ptr)
  is_full := is_full_next
  val ram = Mem(n)
  when (do_enq) {
    ram(enq_ptr) := io.enq_dat
  }
  io.enq_rdy := !is_full
  io.deq_val := !is_empty
  ram(deq_ptr) <> io.deq_dat
}
\end{scala}
\caption{Parameterized FIFO example.}
\label{fig:fifo}
\end{figure}

It is also possible to define a generic decoupled interface:

\begin{scala}
class DecoupledIO[T <: Data](data: T) 
    extends Bundle {
  val ready = Bool(INPUT)
  val valid = Bool(OUTPUT)
  val bits  = data.clone.asOutput
}
\end{scala}

\noindent
This template can then be used to add a handshaking protocol to any
set of signals:

\begin{scala}
class DecoupledDemo 
  extends DecoupledIO()( new DataBundle )
\end{scala}

\noindent
The FIFO interface in Figure~\ref{fig:fifo} can be now be simplified as
follows: 

\begin{scala}
class Fifo[T <: Data] (data: T, n: Int) 
    extends Module {
  val io = new Bundle {
    val enq = new DecoupledIO( data ).flip()
    val deq = new DecoupledIO( data )
  }
  ...
}
\end{scala}


\section{Multiple Clock Domains}

Chisel 2.0 introduces support of multiple clock domains.  

\subsection{Creating Clock domains}

In order to use multiple clock domains, users must create multiple clocks.  
In Chisel, clocks are first class nodes created with a reset signal parameter and defined as follows:

\begin{scala}
class Clock (reset: Bool) extends Node {
  def reset: Bool // returns reset pin
}
\end{scala}

\noindent
% Having reset in clock makes it easier to pass around.
In Chisel there is a builtin implicit clock that state elements use by default:

\begin{scala}
var implicitClock = new Clock( implicitReset )
\end{scala}

The clock for state elements and modules can be defined using an additional named parameter called clock:

\begin{scala}
Reg(... clock: Clock = implicitClock)
Mem(... clock: Clock = implicitClock)
Module(... clock: Clock = implicitClock)
\end{scala}

\subsection{Crossing Clock Domains}

There are two ways that circuits can be defined to send data between clock domains.
The first and most primitive way is by using a synchronizer circuit comprised of two registers as follows:

\begin{scala}
// signalA is in clock domain clockA, 
// want a version in clockB as signalB
val s1 = Reg(init = UInt(0), clock = clockB)
val s2 = Reg(init = UInt(0), clock = clockB)
s1      := signalA
s2      := s1; 
signalB := s2
\end{scala}

\noindent
Due to metastability issues, this technique is limited to communicating one bit data between domains.

The second and more general way to send data between domains is by using an asynchronous fifo:

\begin{scala}
class AsyncFifo[T<:Data]
    (gen: T, entries: Int, enq_clk: Clock, deq_clock: Clock)
  extends Module
\end{scala}

\noindent
When get a version of signalA from clock domains clockA to clockB by
specifying the standard fifo parameters and the two clocks and then using the
standard decoupled ready/valid signals:

\begin{scala}
val fifo =
  new AsyncFifo(Uint(width = 32), 2, clockA, clockB)
fifo.io.enq.bits  := signalA
signalB           := fifo.io.deq.bits
fifo.io.enq.valid := condA
fifo.io.deq.ready := condB
...
\end{scala}

\subsection{Backend Specific Multiple Clock Domains}

Each Chisel backend requires the user to setup and control multiple clocks
in a backend specific manner.  For the purposes of showing how to drive a
multi clock design, consider the example of hardware with two modules
communicating using an AsyncFifo with each module on separate clocks:
\verb+fastClock+ and \verb+slowClock+.

\subsubsection{C++}

%MS: looks like the generated clock code is now a little bit different.
In the C++ backend, for every clock \verb+i+ there is a
\begin{itemize}
\item \verb+size_t clk+.len field representing the clock \verb+i+'s period,
\item \verb+clock_lo_i+ and \verb+clock_hi_i+, 
\item \verb+int reset()+ function which ensures that all \verb+clock_lo+ and \verb+clock_hi+ functions are called at least once, and
\item \verb+int clock(reset)+ function which computes min delta, invokes appropriate \verb+clock_lo+ and \verb+clock_hi+'s and returns min delta used.
\end{itemize}

\noindent
In order to set up a C++ simulation, the user 
\begin{itemize}
\item initializes all period fields to desired period
\item initializes all count fields to desired phase, 
\item calls \verb+reset+ and then
\item repeated calls clock to step the simulation.
\end{itemize}

\noindent
The following is a C++ example of a main function for the \verb+slowClock+ / \verb+fastClock+ example:

\begin{scala}
int main(int argc, char** argv) {
  ClkDomainTest_t dut;
  dut.init(1);
  dut.clk = 2;
  dut.clk_cnt = 1;
  dut.fastClock = 4;
  dut.fastClock_cnt = 0;
  dut.slowClock = 6;
  dut.slowClock_cnt = 0;
  for (int i = 0; i < 12; i ++) 
    dut.reset();
  for (int i = 0; i < 96; i ++) 
    dut.clock(LIT<1>(0));
}
\end{scala}

\subsubsection{Verilog}

In Verilog, 

\begin{itemize}
\item Chisel creates a new port for each clock / reset,
\item Chisel wires all the clocks to the top module, and
\item the user must create an \verb+always+ block clock driver for every clock \verb+i+.
\end{itemize}

\noindent
The following is a Verilog example of a top level harness to drive the  \verb+slowClock+ / \verb+fastClock+ example circuit:

\begin{scala}
module emulator;
  reg fastClock = 0, slowClock = 0, 
      resetFast = 1, resetSlow = 1;
  wire [31:0] add, mul, test;
  always #2 fastClock = ~fastClock;
  always #4 slowClock = ~slowClock;
  initial begin
     #8 
     resetFast = 0;
     resetSlow = 0;
     #400
     $finish;
  end
  ClkDomainTest dut (
     .fastClock(fastClock),
     .slowClock(slowClock),
     .io_resetFast(resetFast),
     .io_resetSlow(resetSlow),
     .io_add(add), .io_mul(mul), .io_test(test));
endmodule
\end{scala}

\noindent
See \url{http://www.asic-world.com/verilog/verifaq2.html} for more information about simulating clocks in Verilog.

\section{Acknowlegements}

Many people have helped out in the design of Chisel, and we thank them
for their patience, bravery, and belief in a better way.  Many
Berkeley EECS students in the Isis group gave weekly feedback as the
design evolved including but not limited to Yunsup Lee, Andrew
Waterman, Scott Beamer, Chris Celio, etc.  Yunsup Lee gave us feedback
in response to the first RISC-V implementation, called TrainWreck,
translated from Verilog to Chisel.  Andrew Waterman and Yunsup Lee
helped us get our Verilog backend up and running and Chisel TrainWreck
running on an FPGA.  Brian Richards was the first actual Chisel user,
first translating (with Huy Vo) John Hauser's FPU Verilog code to
Chisel, and later implementing generic memory blocks.  Brian gave many
invaluable comments on the design and brought a vast experience in
hardware design and design tools.  Chris Batten shared his fast
multiword C++ template library that inspired our fast emulation
library.  Huy Vo became our undergraduate research assistant and was
the first to actually assist in the Chisel implementation.  We
appreciate all the EECS students who participated in the Chisel
bootcamp and proposed and worked on hardware design projects all of
which pushed the Chisel envelope.  We appreciate the work that James
Martin and Alex Williams did in writing and translating network and
memory controllers and non-blocking caches.  Finally, Chisel's
functional programming and bit-width inference ideas were inspired by
earlier work on a hardware description language called Gel~\cite{gel} designed in
collaboration with Dany Qumsiyeh and Mark Tobenkin.

% \note{Who else?}

\begin{thebibliography}{50}
\bibitem{chisel-dac12} Bachrach, J., Vo, H., Richards, B., Lee, Y., Waterman,
  A., Avi\v{z}ienis, Wawrzynek, J., Asanovi\'{c} \textsl{Chisel:
    Constructing Hardware in a Scala Embedded Language}.
in DAC '12.
\bibitem{gel} Bachrach, J., Qumsiyeh, D., Tobenkin, M. \textsl{Hardware Scripting in Gel}.
in Field-Programmable Custom Computing Machines, 2008. FCCM '08. 16th.
\end{thebibliography}

\end{document}
